Linear Regression
Class Reference
class pykitml.LinearRegression(input_size, output_size, reg_param=0)
Implements linear regression.

__init__(input_size, output_size, reg_param=0)
Parameters:
- input_size (int) – Size of input data or number of input features.
- output_size (int) – Size of the output, i.e. the number of values predicted per example.
- reg_param (int) – Regularization parameter for the model, also known as 'weight decay'.
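
To make the role of reg_param concrete, here is a minimal pure-Python sketch of a cost with an L2 ('weight decay') penalty. The exact cost formula and scaling that pykitml uses are not stated on this page, so the form below is an assumption, and regularized_cost is a hypothetical helper, not part of the library.

```python
def regularized_cost(predictions, targets, weights, reg_param=0.0):
    """Squared-error cost plus an L2 ('weight decay') penalty.

    Assumed form, for illustration only:
        cost = (sum((p - t)^2) + reg_param * sum(w^2)) / (2 * m)
    where m is the number of examples.
    """
    m = len(targets)
    squared_error = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    penalty = reg_param * sum(w ** 2 for w in weights)
    return (squared_error + penalty) / (2 * m)


# Made-up numbers, chosen only to show the effect of the penalty.
predictions = [1.0, 2.0, 3.0]
targets = [1.0, 2.0, 5.0]
weights = [2.0, -1.0]

plain = regularized_cost(predictions, targets, weights)           # reg_param=0
penalized = regularized_cost(predictions, targets, weights, 0.6)  # larger cost
```

With reg_param=0 the penalty vanishes and only the fit error remains; a positive value adds a cost for large weights, which is why it tends to pull the weights toward zero.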

feed(input_data)
Accepts input array and feeds it to the model.
Parameters: input_data (numpy.array) – The input to feed the model.
Raises: ValueError – If the input data has invalid dimensions/shape.
Note: This function only feeds the input data. To get the output after calling this function, use get_output() or get_output_onehot().

get_output()
Returns the output activations of the model.
Returns: The output activations.
Return type: numpy.array

train(training_data, targets, batch_size, epochs, optimizer, testing_data=None, testing_targets=None, testing_freq=1, decay_freq=1)
Trains the model on the training data. After training is complete, you can call plot_performance() to plot performance graphs.
Parameters:
- training_data (numpy.array) – numpy array containing training data.
- targets (numpy.array) – numpy array containing training targets, corresponding to the training data.
- batch_size (int) – Number of training examples to use in one epoch, or number of training examples to use to estimate the gradient.
- epochs (int) – Number of epochs the model should be trained for.
- optimizer (any Optimizer object) – See Optimizers.
- testing_data (numpy.array) – numpy array containing testing data.
- testing_targets (numpy.array) – numpy array containing testing targets, corresponding to the testing data.
- testing_freq (int) – How frequently the model should be tested, i.e. the model will be tested after every testing_freq epochs. You may want to increase this to reduce training time.
- decay_freq (int) – How frequently the model should decay the learning rate. The learning rate will decay after every decay_freq epochs.
Raises: ValueError – If training_data, targets, testing_data or testing_targets has invalid dimensions/shape.
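
The interaction between epochs, testing_freq and decay_freq can be sketched without the library. The snippet below assumes 1-based epoch counting and a multiplicative decay (learning_rate *= decay_rate), neither of which is confirmed by this page; training_schedule is a hypothetical helper used only for illustration.

```python
def training_schedule(epochs, testing_freq=1, decay_freq=1,
                      learning_rate=0.02, decay_rate=0.99):
    """Return (epochs after which testing runs, final learning rate).

    Assumes 1-based epoch counting and multiplicative decay; both are
    illustrative assumptions, not documented pykitml behaviour.
    """
    tested_after = []
    for epoch in range(1, epochs + 1):
        if epoch % testing_freq == 0:
            tested_after.append(epoch)       # model would be tested here
        if epoch % decay_freq == 0:
            learning_rate *= decay_rate      # learning rate would decay here
    return tested_after, learning_rate


# epochs, decay_freq, learning_rate and decay_rate mirror the fish length
# example on this page; testing_freq=50 is chosen here to keep the list short.
tested_after, final_lr = training_schedule(epochs=200, testing_freq=50, decay_freq=10)
```

Under these assumptions the model is tested four times and the learning rate is decayed 20 times, so a larger testing_freq directly reduces the number of test passes.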

plot_performance()
Plots logged performance data after training. Should be called after train().
Raises:

r2score(testing_data, testing_targets)
Returns the R-squared value, also known as the coefficient of determination.
Parameters:
- testing_data (numpy.array) – numpy array containing testing data.
- testing_targets (numpy.array) – numpy array containing testing targets, corresponding to the testing data.
Returns: r2score – The R-squared value of the model over the testing data.
Return type: float
Raises: ValueError – If testing_data or testing_targets has invalid dimensions/shape.
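
For reference, the coefficient of determination is conventionally defined as 1 - SS_res / SS_tot. The following is a minimal pure-Python sketch of that standard definition; it is not taken from the pykitml source, and r2_score_sketch is a hypothetical name.

```python
def r2_score_sketch(targets, predictions):
    """Standard R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_target = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    return 1 - ss_res / ss_tot


targets = [1.0, 2.0, 3.0, 4.0]

# Predictions identical to the targets leave no residual error.
perfect = r2_score_sketch(targets, targets)

# Always predicting the mean of the targets explains none of the variance.
mean_only = r2_score_sketch(targets, [2.5, 2.5, 2.5, 2.5])
```

A value of 1 means the predictions match the targets exactly, while 0 means the model does no better than predicting the mean target.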

cost(testing_data, testing_targets)
Tests the average cost of the model on the testing data passed to the function.
Parameters:
- testing_data (numpy.array) – numpy array containing testing data.
- testing_targets (numpy.array) – numpy array containing testing targets, corresponding to the testing data.
Returns: cost – The average cost of the model over the testing data.
Return type: float
Raises: ValueError – If testing_data or testing_targets has invalid dimensions/shape.

Example: Predicting Fish Length
Dataset
Fish Length - pykitml.datasets.fishlength module
Training Model
import pykitml as pk
from pykitml.datasets import fishlength
# Load the dataset
inputs, outputs = fishlength.load()
# Normalize inputs
array_min, array_max = pk.get_minmax(inputs)
inputs = pk.normalize_minmax(inputs, array_min, array_max)
# Create polynomial features
inputs_poly = pk.polynomial(inputs)
# Normalize outputs
array_min, array_max = pk.get_minmax(outputs)
outputs = pk.normalize_minmax(outputs, array_min, array_max)
# Create model
fish_classifier = pk.LinearRegression(inputs_poly.shape[1], 1)
# Train the model
fish_classifier.train(
    training_data=inputs_poly,
    targets=outputs,
    batch_size=22,
    epochs=200,
    optimizer=pk.Adam(learning_rate=0.02, decay_rate=0.99),
    testing_freq=1,
    decay_freq=10
)
# Save model
pk.save(fish_classifier, 'fish_classifier.pkl')
# Plot performance
fish_classifier.plot_performance()
# Print r2 score
print('r2score:', fish_classifier.r2score(inputs_poly, outputs))
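
The training script normalizes inputs and outputs with min-max scaling before fitting. As a rough illustration of what that step does, here is a pure-Python sketch that assumes the common [0, 1] form of min-max scaling; pykitml's own implementation may differ in detail, and the helper names and sample values below are hypothetical.

```python
def minmax_of(column):
    """Return the (minimum, maximum) of a list of numbers."""
    return min(column), max(column)


def scale(value, lo, hi):
    """Map value from [lo, hi] to [0, 1] (assumed form of min-max scaling)."""
    return (value - lo) / (hi - lo)


def unscale(value, lo, hi):
    """Inverse of scale(): map value from [0, 1] back to [lo, hi]."""
    return value * (hi - lo) + lo


# Made-up sample values, not taken from the fishlength dataset.
ages = [14.0, 28.0, 41.0, 55.0]
lo, hi = minmax_of(ages)

scaled = [scale(a, lo, hi) for a in ages]
restored = [unscale(s, lo, hi) for s in scaled]
```

Because scaling depends on the minimum and maximum of the training data, any later prediction has to reuse those same two values, both to scale new inputs and to map the model output back to the original units.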
Predict length of fish that is 28 days old at 25C
import numpy as np
import pykitml as pk
from pykitml.datasets import fishlength
# Predict length of fish that is 28 days old at 25C
# Load the dataset
inputs, outputs = fishlength.load()
# Load the model
fish_classifier = pk.load('fish_classifier.pkl')
# Normalize inputs
array_min, array_max = pk.get_minmax(inputs)
input_data = pk.normalize_minmax(np.array([28, 25]), array_min, array_max)
# Create polynomial features
input_data_poly = pk.polynomial(input_data)
# Get output
fish_classifier.feed(input_data_poly)
model_output = fish_classifier.get_output()
# Denormalize output
array_min, array_max = pk.get_minmax(outputs)
model_output = pk.denormalize_minmax(model_output, array_min, array_max)
# Print result
print(model_output)
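
Both scripts pass the normalized inputs through pk.polynomial before they reach the model. The degree, term ordering and exact output of that function are not described on this page, so the sketch below only illustrates the general idea of polynomial feature expansion, using degree 2 and a hypothetical helper name.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features_sketch(row, degree=2):
    """Expand one input row into all monomials of degree 1..degree.

    Illustrative only: the degree and the ordering of terms are assumptions
    and may not match what pk.polynomial produces.
    """
    features = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(row, d):
            features.append(prod(combo))
    return features


# For inputs [a, b] and degree 2 this yields [a, b, a*a, a*b, b*b].
expanded = polynomial_features_sketch([2.0, 3.0])
```

Expanding the inputs this way lets a model that is linear in its weights fit a curved relationship, which is why the example sizes the model with inputs_poly.shape[1] rather than the raw number of input columns.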
Performance Graph